“This crisis is in the process of annihilating wealth on a gigantic scale through no fault of the real economy or industrialists and without any sensible measures being visible to control a sector of the financial economy which is run and lauded by just a handful of people.
[…] this new stock market and financial mentality knows only one objective – that is to make money, more money and even more money, as much as possible, whatever the costs. This behavior has a very destructive effect on industry.”
— Nicolas Hayek, 5 September 2008
When Nicolas Hayek made this statement in a speech to the Swiss Business Federation (Economiesuisse) on 5 September 2008, the world was in the midst of the Global Financial Crisis of 2007-2009. The statement is arguably rather negative towards financial markets. Hence, one might wonder whether such a public statement by the CEO of one of the world’s most successful watch companies led to a reaction from financial markets and banks. After all, SWATCH stocks are publicly listed. On the other hand, bankers and financial market participants may have had other worries during the crisis. More generally, it is an interesting question whether public speeches by well-known and influential business leaders have an effect on financial markets. [1]
The idea of this R notebook is to introduce everyone
interested in data science to effectively
communicate data analytics by creating clear and engaging
visualisations. Creating engaging visualisations and
telling a story around a statistical analysis make such data analysis
much more memorable and enjoyable for the audience seeing them – be it
for your team or boss at work, for customers, for your school or
research project, for a blog or newspaper article, for the general
public or simply for your friends. Usually, around 95% of the time is
spent on data analysis and coding tasks and only around 5% on visualising
results. But what your audience effectively sees is often closer to 1%
analysis or code and 99% visuals (plus maybe some boring text ;)).
Engaging visuals and storytelling are key when it comes to presenting
data analysis to an audience. On the side, we also take a look at a way
to analyse the influence of Nicolas Hayek’s public
speeches on SWATCH’s stock price. For the
purpose of visualising analyses and findings, the ggplot2
and plotly packages (as well as some additional packages)
are used since they produce high-quality, publication-ready
visualisations and are easy to handle. Both packages are built around
the framework of the so-called Grammar of Graphics, a
systematic syntax for data visualisation, which describes how
a plot is composed of distinct, named components. For more
information, see Hadley Wickham
(2010) - A Layered Grammar of Graphics and Wilkinson
(2011) - The Grammar of Graphics.
I can also highly recommend the following resources:
To fully understand this R notebook, some familiarity with
base R and tidyverse syntax is beneficial.
Otherwise, it should nonetheless be possible to reproduce and adjust the
graphs and steps by copying the provided R code and slightly
adjusting it. The additionally referenced resources and R packages above
and throughout the notebook may also be of help. Moreover, some knowledge of
finance and statistics makes the content
easier to follow.
All programming code can be shown or hidden in the upper
right corner of the notebook or, alternatively, by clicking the
code button present in each cell.
# Turn off warning messages
options(warn = -1)
# Custom function for checking installation of packages and loading them
install_and_load_package <- function(package) {
# Check whether package is already installed and if not, install it
if (!require(package, character.only = T)) {
install.packages(package, dependencies = T)
}
# Load specified package
require(package, character.only = T)
}
# Specify packages needed for analysis in character vector
packages <- c("conflicted",
"gapminder",
"httr",
"quantmod",
"tidyverse",
"lubridate",
"tsbox",
"tidytext",
"ggrepel",
"plotly",
"viridis",
"viridisLite",
"RColorBrewer")
# Install and load needed packages
lapply(packages, install_and_load_package)
# Conflicted: hierarchy in case of conflict
conflict_prefer("filter", "dplyr")
conflict_prefer("select", "dplyr")
conflict_prefer("first", "dplyr")
conflict_prefer("last", "dplyr")
conflict_prefer("lag", "dplyr")
conflict_prefer("flatten", "purrr")
conflict_prefer("layout", "plotly")
# Color settings
palette(viridis(n = 10))
col_palette_red <- brewer.pal(n = 9, name = "OrRd")
col_palette_yellow <- brewer.pal(n = 9, name = "YlOrRd")
col_palette_green <- brewer.pal(n = 9, name = "YlGn")
col_palette_blue <- brewer.pal(n = 9, name = "PuBu")
col_palette_grey <- brewer.pal(n = 9, name = "Greys")
Information is highly valued in financial markets, and accurate, openly available data is not easy to come by. However, there are websites such as Yahoo Finance where financial market data can be downloaded freely. We thus start by gathering publicly available stock price data and data on Nicolas Hayek’s speeches. Subsequently, we transform the data and create a few graphs. Lastly, we address our question of whether Mr. Hayek’s speeches influenced stock prices with a simple visual data analysis.
# Option for the quantmod package: silence the getSymbols() notice shown on first use
options("getSymbols.warning4.0" = FALSE)
To analyse the influence of Mr. Hayek’s speeches on the SWATCH stock
price, we start by getting stock data for SWATCH.
SWATCH stock data is publicly available from Yahoo
Finance and we use the R quantmod package, which
offers a simple and convenient interface for getting stock price data.
All that is required to download the data is the ticker
symbol of the stock, here “UHRN.SW”.
getSymbols(Symbols = "UHRN.SW",
src = "yahoo",
verbose = F)
The downloaded stock price data is now available as an xts time series
object by calling UHRN.SW. The SWATCH stock
data looks like this, with daily observations for each
trading day organised in the rows, indexed by date, and six
variables, also called features in the data science context, in
the columns.
head(UHRN.SW)
## UHRN.SW.Open UHRN.SW.High UHRN.SW.Low UHRN.SW.Close UHRN.SW.Volume
## 2007-01-03 55.00 55.20 54.80 55.15 204994
## 2007-01-04 54.95 55.25 54.75 55.25 186117
## 2007-01-05 55.00 55.25 54.15 54.20 182313
## 2007-01-08 54.05 54.90 54.00 54.65 215321
## 2007-01-09 54.65 54.80 54.20 54.55 95947
## 2007-01-10 54.20 54.50 53.90 54.45 152143
## UHRN.SW.Adjusted
## 2007-01-03 39.80366
## 2007-01-04 39.87583
## 2007-01-05 39.11801
## 2007-01-08 39.44279
## 2007-01-09 39.37062
## 2007-01-10 39.29844
For each of the 4’171 daily observations we have the corresponding
date as the row index, the Opening stock
price at trading start on the exchange, the daily Highest
and Lowest price, the Close at end of trading,
the trading Volume, and finally an Adjusted
price, accounting for stock splits, dividends, and similar corporate
actions. We later transform the object into a more readable
format with an explicit Date column.
Second, to have a comparable benchmark to the SWATCH stock, we also get SPI (Swiss Performance Index) data (ticker = “SPICHA.SW” through the UBS ETF CH SPI) from Yahoo Finance.
getSymbols(Symbols = "SPICHA.SW",
src = "yahoo",
verbose = F)
Get Apple stock data (hint: ticker = “AAPL”).
getSymbols(Symbols = "AAPL",
src = "yahoo",
verbose = F)
The second ingredient for our analysis is data on Mr. Hayek’s public speeches. This data allows us to determine, first, when and where a public speech by Mr. Hayek took place and, second, what the content of the speech was. Six of Mr. Hayek’s speeches are publicly available on the SWATCH Group website. We thus start by setting the base URL for the SWATCH Group website.
http_link <- "https://www.swatchgroup.com/en/"
Each of the six speeches on the website can be found under a different URL, which we store in a new data.frame/tibble. Instead of writing a script that downloads and parses the dates and places of the speeches, it is faster to copy that information by hand from the website. The following data.frame/tibble already holds this information.
df_URL_speches <-
tibble(Date = c(ymd("2010-04-10"), ymd("2010-03-05"), ymd("2009-08-24"),
ymd("2009-05-27"), ymd("2009-03-03"), ymd("2008-09-05")),
Place = c("Paris, France", "PSI Colloquium Villigen, Switzerland", "Interlaken, Switzerland",
"Berne, Switzerland", "Berne, Switzerland", "Baden, Switzerland"),
URL_speeches = c("nicolas-g-hayek-sorbonne",
"eve-renewable-energy-age",
"nicolas-g-hayeks-speech-swiss-ambassadors",
"happy-birthday-csem",
"nicolas-g-hayek-about-switzerland-and-european-union",
"economy-day-swiss-business-federation-economiesuisse"))
df_URL_speches
We can now build a vector holding the full URL of each of the six speeches by concatenating the base URL and the six speech URLs.
v_http_links_final <-
paste(http_link, df_URL_speches$URL_speeches, sep = "")
Since actually downloading, parsing, and cleaning the text data for Mr. Hayek’s speeches is slightly more involved, we can simply use the pre-written functions below. There’s no need to understand the functions in detail; their only purpose is to download the speeches in HTML format and return a clean text version of those speeches in a data.frame/tibble.
# Define `read_csv_confd` function to read in speeches from URLs
read_csv_confd <-
function(path) {
read_csv(file = path, col_names = "HTML")
}
Let’s request Hayek’s speeches from the SWATCH Group website and save them as a list of tibbles.
l_df_Hayek_speeches <-
lapply(v_http_links_final,
read_csv_confd)
We continue with transforming the gathered stock price data into an easy-to-use format. This step in data analysis is often called data wrangling and transforms the previously gathered stock price and speech data into the desired format for further analysis.
We want a tibble as primary data object for our
analysis. Tibbles are enhanced data.frames
provided by the tibble package (part of the tidyverse); the
tsbox R package converts time series objects into them. They provide a
standardised way of storing data from varying sources. I also use the
|> (pipe) syntax to make the programming code
easier to read: x |> f() passes x as the first argument to f(),
so it is equivalent to f(x).
df_data_SWATCH <-
UHRN.SW |>
ts_tbl() |>
ts_wide() |>
rename(Date = time,
Open = UHRN.SW.Open,
High = UHRN.SW.High,
Low = UHRN.SW.Low,
Close = UHRN.SW.Close,
Volume = UHRN.SW.Volume,
Adjusted = UHRN.SW.Adjusted)
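As a quick illustration of the pipe syntax used above, here is a minimal base R sketch (it needs R 4.1 or later). The price values are made up for illustration and are not the downloaded SWATCH data; the point is only that a piped chain and the equivalent nested call give the same result.

```r
# Toy price vector (hypothetical values, for illustration only)
prices <- c(55.15, 55.25, 54.20, 54.65)

# Nested call: has to be read from the inside out
nested <- round(mean(diff(prices)), digits = 3)

# Piped call: reads from left to right, one step per line
piped <-
  prices |>
  diff() |>
  mean() |>
  round(digits = 3)

# Both ways of writing the computation give the same number
identical(nested, piped)
```

The piped version is longer, but each step sits on its own line, which is why the longer transformations in this notebook are written that way.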
Let’s quickly look at the SWATCH stock price data. The data formatting is now much better and the features are easier to read and work with.
df_data_SWATCH
We do the same for the SPI data.
df_data_SPI <-
SPICHA.SW |>
ts_tbl() |>
ts_wide() |>
rename(Date = time,
Open = SPICHA.SW.Open,
High = SPICHA.SW.High,
Low = SPICHA.SW.Low,
Close = SPICHA.SW.Close,
Volume = SPICHA.SW.Volume,
Adjusted = SPICHA.SW.Adjusted)
Finally, we combine both SWATCH stock price and SPI time series to have them available in a single tibble.
df_data_SWATCH_SPI <-
df_data_SWATCH |>
full_join(df_data_SPI,
by = "Date",
suffix = c("_SWATCH", "_SPI"))
We can now compute returns for both stocks.
df_data_SWATCH_SPI <-
df_data_SWATCH_SPI |>
mutate(Returns_SWATCH = Adjusted_SWATCH / lag(Adjusted_SWATCH) - 1,
Returns_SPI = Adjusted_SPI / lag(Adjusted_SPI) - 1)
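To make the return formula concrete, the following base R sketch applies the same calculation (today's price divided by the previous day's price, minus one) to four made-up prices. The numbers and variable names are hypothetical; only the formula mirrors the code above.

```r
# Hypothetical adjusted prices for four consecutive trading days
adjusted <- c(100, 102, 99.96, 104.958)

# Previous day's price; the first day has no predecessor, hence NA
adjusted_lag <- c(NA, head(adjusted, n = -1))

# Simple daily return: P_t / P_(t-1) - 1
returns <- adjusted / adjusted_lag - 1
round(returns, digits = 4)
```

Because the lag is taken over the row order, this kind of calculation assumes the rows are sorted by date. After a join, it can be worth checking that this is still the case.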
Turn Apple stock data into a tibble with appropriate format.
df_Apple_data <-
AAPL |>
ts_tbl() |>
ts_wide() |>
rename(Date = time,
Open = AAPL.Open,
High = AAPL.High,
Low = AAPL.Low,
Close = AAPL.Close,
Volume = AAPL.Volume,
Adjusted = AAPL.Adjusted)
Again, we can use the pre-written functions below to transform the data on Mr. Hayek’s speeches. These functions help to clean the text version of those speeches.
# Define function `detect_HTML_paragraph_indices` to detect start and end of speeches in HTMLs
detect_HTML_paragraph_indices <- function(tibble) {
  # Flag the rows that start with a paragraph tag and keep the first and last
  # of their row numbers, so that summarise() returns exactly one row
  df_HTML_paragraph_indices <-
    tibble |>
    mutate(HTML_paragraph = str_detect(HTML, pattern = "^<p>")) |>
    summarise(HTML_paragraph_index_first = first(which(HTML_paragraph)),
              HTML_paragraph_index_last = last(which(HTML_paragraph)))
  return(df_HTML_paragraph_indices)
}
# Define function `extract_HTML_text` to clean speech texts from HTML and other parts
extract_HTML_text <- function(tibble, text_index_start, text_index_end) {
  df_HTML_text <-
    tibble |>
    # Keep only the rows between the first and last paragraph tag
    slice(text_index_start:text_index_end) |>
    # Strip tags and HTML entities; patterns are applied in the order given
    mutate(HTML = str_replace_all(HTML,
                                  pattern = c("^<.{1,5}>" = "",
                                              "<.{1,5}>$" = "",
                                              "<div.*>" = "",
                                              "<strong>.*</strong>" = "",
                                              "&nbsp;" = "", # non-breaking space entity, not ordinary spaces
                                              "<img.*>" = "",
                                              "<em.*>" = "",
                                              "<.*>" = ""))) # Recheck this line
  return(df_HTML_text)
}
# Detect start and endings of Hayek's speeches in HTMLs
df_Hayek_speeches_HTML_paragraph_indices <-
map_dfr(l_df_Hayek_speeches,
detect_HTML_paragraph_indices)
# l_a <- list(df = l_df_Hayek_speeches,
# start = df_Hayek_speeches_HTML_paragraph_indices$HTML_paragraph_index_first,
# end = df_Hayek_speeches_HTML_paragraph_indices$HTML_paragraph_index_last)
# pmap(,
# extract_HTML_text(tibble = df, text_index_start = start, text_index_end = end))
# Clean Hayek's speeches
# TODO: Improve programming! Remove empty lines?
l_df_Hayek_speeches_clean <-
  vector(mode = "list", length = nrow(df_Hayek_speeches_HTML_paragraph_indices))
for (index in seq_len(nrow(df_Hayek_speeches_HTML_paragraph_indices))) {
  l_df_Hayek_speeches_clean[[index]] <-
    extract_HTML_text(tibble = l_df_Hayek_speeches[[index]],
                      text_index_start = df_Hayek_speeches_HTML_paragraph_indices$HTML_paragraph_index_first[index],
                      text_index_end = df_Hayek_speeches_HTML_paragraph_indices$HTML_paragraph_index_last[index])
}
names(l_df_Hayek_speeches_clean) <-
df_URL_speches$Date
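For readers who do want a feel for what the two helper functions do, here is a small base R sketch on made-up HTML lines. It is not the notebook's implementation: the toy content is hypothetical, and a single tag pattern (`<[^>]+>`) is used instead of the list of patterns above. The idea is the same, though: find the first and last line starting with a paragraph tag, keep that range, and strip the tags.

```r
# Toy HTML lines standing in for one downloaded page (hypothetical content)
html_lines <- c("<div class=\"header\">",
                "<p>Ladies and gentlemen,</p>",
                "<p>thank you for the invitation.</p>",
                "<div class=\"footer\">")

# Lines starting with a paragraph tag mark the speech text
is_paragraph <- grepl(pattern = "^<p>", x = html_lines)
index_first <- min(which(is_paragraph))
index_last <- max(which(is_paragraph))

# Keep only that range and strip every remaining tag
speech_text <- gsub(pattern = "<[^>]+>",
                    replacement = "",
                    x = html_lines[index_first:index_last])
speech_text
```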
The speeches are saved in a list of tibbles and can be accessed
through the index number of the speech
l_df_Hayek_speeches_clean[[index]] in the list object. We
could, for example, look at the second of Mr. Hayek’s speeches.
l_df_Hayek_speeches_clean[[2]]
After cleaning and transforming Mr. Hayek’s speech data, we proceed with a simple text analysis of the speech content. We start by splitting one of Mr. Hayek’s speeches into single words.
df_Hayek_speeches_words <-
l_df_Hayek_speeches_clean[[2]] |>
unnest_tokens(input = HTML,
output = word,
token = "words",
format = "html",
to_lower = T,
drop = T)
Additionally, it is good practice to remove stop words such as ‘the’ or ‘a’, because they carry less meaning than verbs, nouns, or adjectives.
df_Hayek_speeches_words_wo_stop_words <-
df_Hayek_speeches_words |>
anti_join(stop_words,
by = "word")
We may now count the word frequency and keep the 15 most frequent words.
df_Hayek_speeches_words_wo_stop_words_counted <-
df_Hayek_speeches_words_wo_stop_words |>
count(word, sort = T) |>
slice(1:15)
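The three steps above (split into words, drop stop words, count) can also be sketched in base R on a single made-up sentence. The sentence and the three-word stop list are assumptions for illustration; the notebook itself uses tidytext's tokenizer and its much longer `stop_words` table.

```r
# Hypothetical one-sentence "speech"
speech_toy <- "Energy is the future and the future is renewable energy"

# 1. Split into lower-case words
words <- strsplit(tolower(speech_toy), split = "[^a-z]+")[[1]]

# 2. Remove stop words (tiny hypothetical list)
stop_words_toy <- c("is", "the", "and")
words_kept <- words[!words %in% stop_words_toy]

# 3. Count and sort by frequency, most frequent first
word_counts <- sort(table(words_kept), decreasing = TRUE)
word_counts
```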
Now we’re ready to create our first graph, a line or
time series chart. We use SWATCH’s
stock price data and the ggplot plotting engine. We need
the previously mentioned Grammar of Graphics to set up
each specific component in the plot. First, we need to
map the data to so-called
aesthetics in the plot. For a visual overview and
corresponding explanations of the different components in
ggplot’s Grammar of Graphics, see this Towards
Data Science article:
Aesthetics are defined within the aes() function in
ggplot and include plot specifications such as what goes on
the x-axis and y-axis, what is shown in which colour, how the size of an
object in a plot is determined and many more. For our basic time series
plot, we simply map the Date column from the stock data to
the x-axis and the Adjusted stock price to the
y-axis. The only additional component to add to get a finished
plot now is a so-called geom (short for geometric
objects). Geoms determine the kind of plot we want to display and are
added with the set of geom_... functions. Here, we’d like
to create a simple line plot with
geom_line(). First, we add a new component to the plot by
using the + operator, which separates each of the
components of the plot. Then we set the line geom and, after saving the
plot to a new R object, we have our first plot.
p_basic_time_series_SWATCH <-
ggplot(data = df_data_SWATCH,
aes(x = Date, y = Adjusted)) + # Close
geom_line()
p_basic_time_series_SWATCH
Create a time series plot for Apple’s stock price. You can also try to adjust the axis scales, in case you have an idea how to do it.
df_Apple_data |>
ggplot(aes(x = Date, y = Adjusted)) +
geom_line() +
scale_x_date(date_breaks = "1 year",
date_labels = "%Y") +
scale_y_continuous(labels = scales::dollar,
breaks = seq(from = 0, to = max(df_Apple_data$Adjusted, na.rm = T), by = 20))
So far, so good. This is what we get by using ggplot’s
default settings. However, the plot doesn’t look
particularly great, does it? The grey background is rather irritating,
the date on the x-axis is only displayed every five years, it’s unclear
in what units the y-axis is measured, and in general, there’s no title
or anything else to indicate what exactly is shown here. The only
information we have is the evolution of the series over a time period of
roughly 16 years and its corresponding values on the y-axis. So we need to
adjust some basic components of the plot.
Since we already defined our data and aesthetics
components, we start by adjusting the scales of the x- and
y-axes in a new component, the scales component. This
ensures that we get proper units and labels for the x- and y-axis. We take
the ggplot object from above and add
scale_x_... and scale_y_... functions with
proper arguments. The x-axis should be set to dates in year units and
the y-axis to a continuous scale in CHF, the currency in which
SWATCH shares are quoted on the Swiss exchange.
p_basic_time_series_SWATCH_w_scales <-
  p_basic_time_series_SWATCH +
  scale_x_date(date_breaks = "1 year",
               date_labels = "%Y") +
  scale_y_continuous(labels = scales::number_format(prefix = "CHF "),
                     breaks = scales::pretty_breaks(n = 6))
p_basic_time_series_SWATCH_w_scales
The theme of a plot is yet another component in the
Grammar of Graphics. Setting a cleaner theme will help us to
get rid of the irritating grey background. Let’s try the
theme_classic() function.
p_basic_time_series_SWATCH_w_scales_and_theme <-
p_basic_time_series_SWATCH_w_scales +
theme_classic()
p_basic_time_series_SWATCH_w_scales_and_theme
Let’s create a line plot with the same theme and appropriately adjusted scales for Apple. Try adding a proper title to the plot.
p_time_series_Apple <-
df_Apple_data |>
ggplot(aes(x = Date, y = Adjusted)) +
geom_line() +
scale_x_date(date_breaks = "1 year",
date_labels = "%Y") +
scale_y_continuous(labels = scales::dollar,
breaks = seq(from = 0, to = max(df_Apple_data$Adjusted, na.rm = T), by = 20)) +
theme_classic() +
labs(title = "A Story of Success (and Steve Jobs)",
subtitle = "Apple's Stock Price",
y = "Close (Adjusted)",
caption = "© Matthieu Rüttimann")
p_time_series_Apple
theme_classic() is quite a clean and simple theme.
For the purpose of interpreting a time series plot, however, a theme
including a grid may be more appropriate. Thus, in the
following plots, we use theme_light() instead. By adding
additional theme elements, we make sure the grid lines stay
in the background of the plot by drawing them very thinly, since they
are only meant to support the viewer in reading values off the
axes. We also accentuate the x- and y-axes by
drawing them thicker than the background grid lines. Plot titles,
subtitles, and axis labels are set with the
labs() function, which we use to add a proper title and to
adjust the label of the y-axis to make it clearer what
it represents. Finally, we add a caption with a copyright for the plot.
Now we have our first complete time series plot.
p_basic_time_series_SWATCH_w_scales_themed <-
p_basic_time_series_SWATCH_w_scales +
theme_light() +
theme(plot.title = element_text(face = "bold"), # Bold plot titles
axis.line = element_line(size = 0.75), # thicker axes
panel.grid.major = element_line(size = 0.05), # softer grid lines
panel.grid.minor = element_line(size = 0.05)) + # softer grid lines
labs(title = "Steady As a Ship...?",
subtitle = "SWATCH Group Stock Price (Ticker: UHRN.SW)",
y = "Close (Adjusted)",
caption = "© Matthieu Rüttimann")
p_basic_time_series_SWATCH_w_scales_themed
For the following plots, let’s set a global default
ggplot theme, instead of adding it manually to each
plot.
theme_set(theme_light())
To improve further on our plot, we can add a so-called
benchmark to it. A benchmark is, e.g., another time
series to compare the SWATCH stock price to. We use the previously
gathered SPI series to do exactly that. In order to
compare the stock prices of the two series directly to each other, a
rebasing of the prices to a specific time point is required. We choose
2011-07-18, since it is the first day with available
observations for the SPI in our data sample.
df_data_SWATCH_SPI_filtered <-
df_data_SWATCH_SPI |>
filter(Date >= "2011-07-18") |>
mutate(Adjusted_SWATCH_Rebased = Adjusted_SWATCH / first(Adjusted_SWATCH),
Adjusted_SPI_Rebased = Adjusted_SPI / first(Adjusted_SPI))
df_scale_date <-
df_data_SWATCH_SPI_filtered |>
summarise(scale_date_min = min(Date, na.rm = T),
scale_date_max = max(Date, na.rm = T) + 250)
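Rebasing simply divides every price by the price on the chosen start date, so that both series start at 1 (that is, 100%) and can be compared on one axis. A minimal base R sketch with made-up prices (all values hypothetical):

```r
# Hypothetical prices of a stock and a benchmark on four dates
price_stock <- c(400, 420, 380, 360)
price_benchmark <- c(50, 52, 55, 60)

# Divide by the first observation: both series now start at 1 (= 100%)
stock_rebased <- price_stock / price_stock[1]
benchmark_rebased <- price_benchmark / price_benchmark[1]

rbind(stock_rebased, benchmark_rebased)
```

Despite very different price levels, the rebased series show directly that the hypothetical stock ended at 90% of its starting value while the benchmark ended at 120%. Note that if the first observation after filtering were missing, every rebased value would be NA, so it is worth checking the first row.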
Let’s first create a rebased line graph of the SWATCH stock, starting
on 2011-07-18. In addition to the prior graphs, we add a
geom_hline element to indicate the 100%-y-line.
p_time_series_SWATCH_vs_SPI <-
df_data_SWATCH_SPI_filtered |>
ggplot(aes(x = Date)) +
geom_hline(yintercept = 1, size = 1.5, col = "grey", alpha = 0.5) +
geom_line(aes(y = Adjusted_SWATCH_Rebased), col = col_palette_red[6]) +
geom_point(aes(x = last(Date),
y = last(Adjusted_SWATCH_Rebased)),
col = col_palette_red[6],
shape = 3,
size = 2) +
scale_x_date(date_breaks = "1 year",
date_labels = "%Y",
limits = c(df_scale_date$scale_date_min, df_scale_date$scale_date_max)) +
scale_y_continuous(labels = scales::percent,
breaks = scales::pretty_breaks(n = 8)) +
labs(title = "Was it SWATCH's Time to Perform?",
subtitle = "SWATCH Stock Price vs. SPI Benchmark",
y = "Price Rebased (%)",
caption = "© Matthieu Rüttimann") +
theme(legend.text = element_text(),
plot.title = element_text(face = "bold"),
axis.line = element_line(size = 0.75),
panel.grid.major = element_line(size = 0.05),
panel.grid.minor = element_line(size = 0.05))
p_time_series_SWATCH_vs_SPI
Now we take the above graph and add the SPI benchmark, as well as text labels.
p_time_series_SWATCH_vs_SPI <-
p_time_series_SWATCH_vs_SPI +
geom_line(aes(y = Adjusted_SPI_Rebased), col = col_palette_green[7]) +
geom_point(aes(x = last(Date),
y = last(Adjusted_SPI_Rebased)),
col = col_palette_green[7],
shape = 3,
size = 2) +
geom_text(label = "SWATCH",
aes(x = last(Date),
y = last(Adjusted_SWATCH_Rebased)),
color = col_palette_red[7],
size = 2.5,
hjust = -0.3) +
geom_text(label = "SPI",
aes(x = last(Date),
y = last(Adjusted_SPI_Rebased)),
color = col_palette_green[8],
size = 2.5,
hjust = -0.3)
p_time_series_SWATCH_vs_SPI
SWATCH evidently underperformed the SPI over the period from 2011 to 2023. In particular, beginning around June 2014, the price of SWATCH declined rather sharply relative to the SPI. Over the entire period, SWATCH shares lost slightly in value while the SPI rose by over 200%.
Finally, we can add a shaded background highlighting the
time period when the two price series started to diverge. We can do this
with the annotate() function. Highlighting areas or specific
parts of a chart is a useful element in storytelling with
data (engaging titles, proper labels, and colours are
others). A general suggestion is to use colours for specific
messages we want to convey to our audience. People usually associate red
colours with negative sentiments and green colours with positive ones,
even subconsciously. In addition, red colours in particular are usually
the first thing the eye picks up when looking at a graph. Finally,
apart from colour encoding and visual elements capturing our
attention, most people read a graph from left to right and from top to bottom.
p_time_series_SWATCH_vs_SPI +
annotate(geom = "rect",
xmin = as.Date("2014-06-01"),
xmax = as.Date("2018-07-01"),
ymin = -Inf,
ymax = Inf,
col = "grey",
alpha = 0.05) +
annotate(geom = "text",
label = "Divergence",
x = as.Date("2016-07-01"),
y = 2.5,
col = col_palette_grey[7],
size = 4)
Let’s try adding a title, subtitle, and some text or line annotations to our Apple chart as an exercise. Titles and text annotations are a great way to tell a story in a graph.
text_Apple <- "…Nevertheless, \n some bumps \n occurred along \n the road"
p_time_series_Apple +
labs(title = "Apple Fared Pretty Well…",
subtitle = "Apple's Stock Price Over 16 Years",
y = "Stock Price") +
geom_text(x = as.Date("2018-07-01"),
y = 80,
color = col_palette_blue[4],
label = text_Apple,
size = 3)
One of the most common and most useful graphs is a bar
chart. Let’s create one. We need a new geom, geom_col(),
for that purpose. Additionally, we order the x-aesthetic and the fill
by the word frequency n with
fct_reorder, adding the argument .desc = T to
sort the bars from most to least frequent.
p_Hayek_speech_words_bar_chart <-
df_Hayek_speeches_words_wo_stop_words_counted |>
ggplot(aes(x = fct_reorder(word, n, .desc = T), y = n, fill = fct_reorder(word, n, .desc = T))) +
geom_col(alpha = 0.9) +
scale_fill_viridis_d(name = "Words",
direction = -1) +
labs(title = "How Does Hayek Speak? What Words Does He Use?",
subtitle = "Speech at PSI Colloquium Villigen, Switzerland on 5 March 2010, by Nicolas Hayek",
x = "Words",
y = "Word Frequency") +
theme(axis.text.x = element_text(angle = 60)) # Turn x-axis labels by 60 degrees
p_Hayek_speech_words_bar_chart
We can add the frequency also as text with geom_text to
make the graph more legible.
p_Hayek_speech_words_bar_chart <-
p_Hayek_speech_words_bar_chart +
geom_text(aes(y = n + 2.5, label = n),
size = 2.5)
p_Hayek_speech_words_bar_chart
A useful tip is to flip bar charts so that the bars run horizontally, which makes the word labels easier to read and the chart visually more appealing.
p_Hayek_speech_words_bar_chart_inverted <-
df_Hayek_speeches_words_wo_stop_words_counted |>
ggplot(aes(x = fct_reorder(word, n), y = n, fill = fct_reorder(word, n))) +
geom_col(alpha = 0.9) +
geom_text(aes(y = n + 2.5, label = n),
size = 2.5) +
coord_flip() +
scale_fill_viridis_d(name = "Words",
direction = 1) +
labs(title = "How Does Hayek Speak? What Words Does He Use?",
subtitle = "Speech at PSI Colloquium Villigen, Switzerland on 5 March 2010, by Nicolas Hayek",
x = "Words",
y = "Word Frequency")
p_Hayek_speech_words_bar_chart_inverted
Finally, we’ll try to answer our question: did Nicolas Hayek’s speeches influence SWATCH’s stock price? To do so, we analyse the relation between Hayek’s speeches and SWATCH’s stock price data.
Disclaimer: We do not have intraday data here, and financial markets usually react within minutes or even seconds to unanticipated news. Hence, to evaluate rigorously whether Mr. Hayek’s speeches had an influence on SWATCH’s stock price, we’d need to look at intraday tick data, not daily data. The analysis in this section is therefore only illustrative.
We combine stock and speech data.
df_data_SWATCH_SPI_speeches <-
df_data_SWATCH_SPI |>
left_join(df_URL_speches,
by = "Date")
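The join keeps every trading day and attaches speech information only where the dates match. The base R sketch below shows the same idea with `merge(..., all.x = TRUE)` on made-up data; the prices are hypothetical and the second speech date is a hypothetical Saturday added to show one pitfall.

```r
# Three hypothetical trading days (a Thursday, a Friday, and the following Monday)
prices <- data.frame(Date = as.Date(c("2008-09-04", "2008-09-05", "2008-09-08")),
                     Adjusted = c(100, 98, 99))

# One speech on a trading day and one hypothetical speech on a Saturday
speeches <- data.frame(Date = as.Date(c("2008-09-05", "2008-09-06")),
                       Place = c("Baden, Switzerland", "Hypothetical place"))

# Left join: keep all trading days, add speech columns where dates match
merged <- merge(prices, speeches, by = "Date", all.x = TRUE)

# Flag speech days; days without a matching speech have NA in Place
merged$speech_day <- ifelse(is.na(merged$Place), "No", "Yes")
merged
```

The Saturday speech finds no matching trading day and silently drops out of the joined data. The same would apply to any real speech date that falls on a weekend or holiday; in the table above, the 2010-04-10 entry falls on a Saturday.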
We can take one of our prior plots of SWATCH’s stock price and add
red lines with geom_vline which indicate each date of
Hayek’s six speeches.
p_basic_time_series_SWATCH_w_scales_themed +
geom_vline(xintercept = df_URL_speches$Date,
color = "red",
alpha = 0.6)
However, stock prices themselves are not very indicative of the influence the speeches had. Although one may think that the first of Hayek’s speeches led to a sharp decline in SWATCH’s stock price, the stock market usually absorbs information extremely quickly, and the downturn could simply have coincided with the speech by chance.
We need to look at returns instead of prices to answer our question. One possible approach is to check whether stock returns on the day of a speech were higher than on comparable days shortly before or after it.
df_data_SWATCH_SPI_speeches <-
df_data_SWATCH_SPI_speeches |>
mutate(speech_day = if_else(!is.na(URL_speeches), "Yes", "No"))
df_data_SWATCH_SPI_speeches |>
ggplot(aes(x = Date, y = Returns_SWATCH, col = speech_day)) +
geom_hline(yintercept = 0,
col = col_palette_grey[5],
alpha = 0.5,
size = 2.5) +
geom_point(alpha = 0.6) +
geom_vline(xintercept = df_URL_speches$Date,
color = "red",
alpha = 0.6) +
scale_x_date(breaks = scales::pretty_breaks(n = 6),
limits = c(min(df_URL_speches$Date) - 50,
max(df_URL_speches$Date) + 50)) +
scale_y_continuous(labels = scales::percent,
breaks = scales::pretty_breaks(n = 6)) +
scale_color_manual(name = "Speech Day",
values = c("Yes" = col_palette_red[7],
"No" = "black")) +
labs(title = "Did Mr. Hayek's Speeches Lead to Higher Stock Returns?",
subtitle = "Daily SWATCH Returns and Hayek's Speeches, Red Lines Indicate Speeches",
y = "Stock Returns") +
theme(legend.position = "bottom")
Looking at the graph above, it does not appear that stock returns on speech days (in red) were higher than on normal days.
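The visual comparison can be complemented by a simple number: how far the speech-day return lies from the average of the surrounding days, measured in standard deviations of those days. The base R sketch below does this for one speech with made-up returns. The five-day window on each side is an arbitrary assumption, and this is only a rough illustration, not a formal event study.

```r
# Hypothetical daily returns; position 6 is the speech day
returns <- c(0.004, -0.012, 0.007, 0.001, -0.003,
             -0.009,
             0.005, 0.010, -0.006, 0.002, -0.001)
speech_index <- 6
half_window <- 5  # trading days before and after the speech (arbitrary choice)

# Positions of the surrounding days, excluding the speech day itself
neighbours <- setdiff((speech_index - half_window):(speech_index + half_window),
                      speech_index)

speech_return <- returns[speech_index]
neighbour_mean <- mean(returns[neighbours])
neighbour_sd <- sd(returns[neighbours])

# Distance of the speech-day return from its neighbours, in standard deviations
z_score <- (speech_return - neighbour_mean) / neighbour_sd
z_score
```

An absolute value well below 2, as in this toy example, would suggest that the speech-day return is not unusual relative to its neighbours. With only six speeches, any such number should be read with caution.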
An even better way to analyse whether the speeches had an influence
is to look at the range of daily stock prices instead.
We define the range here as the High - Low of
daily stock prices, which we already have as features in our
data sample.
df_data_SWATCH_SPI_speeches <-
df_data_SWATCH_SPI_speeches |>
mutate(range = High_SWATCH - Low_SWATCH)
df_data_SWATCH_SPI_speeches |>
ggplot(aes(x = Date, y = range, col = speech_day)) +
geom_hline(yintercept = 0,
col = col_palette_grey[5],
alpha = 0.5,
size = 2.5) +
geom_point(alpha = 0.6) +
geom_vline(xintercept = df_URL_speches$Date,
color = "red",
alpha = 0.6) +
scale_x_date(limits = c(min(df_URL_speches$Date) - 50,
max(df_URL_speches$Date) + 50),
breaks = scales::pretty_breaks(n = 6)) +
scale_y_continuous(labels = scales::number_format(prefix = "CHF "),
breaks = scales::pretty_breaks(n = 6)) +
scale_color_manual(name = "Speech Day",
values = c("Yes" = col_palette_red[7],
"No" = "black")) +
labs(title = "Did Mr. Hayek's Speeches Lead to a Wider Price Range?",
subtitle = "Daily SWATCH High-Low Range and Hayek's Speeches, Red Lines Indicate Speeches",
y = "Daily High - Low (Range of Stock Prices)") +
theme(legend.position = "bottom")
Again, Hayek’s speeches do not appear to have a particularly strong relation with the range of stock prices.
Hence, based on our simple visual analysis, we can conclude that Nicolas Hayek’s speeches likely did not have a noticeable influence on SWATCH’s stock price.
[1] Sources: Swissinfo.ch and Swatch Group website